Application No. 10/777,072 
November 20, 2007 
Page 2 



Listing of the Claims: 

1. (Currently Amended) An apparatus for recognizing a biological named entity from 
biological literature based on unified medical language system (UMLS), comprising: 

a resource construction unit for receiving metathesaurus from the UMLS and 
constructing a concept name database, a single name database and a category keyterm database, 
which are language resources to be used to recognize a named entity; 

a rule collection unit configured to receive each concept name stored in the concept name 
database, extract a feature from each of the concept names by using data stored in the single 
name database and the category keyterm database, and construct a rule database by creating a 
rule used to recognize the named entity and filtering the rules by using the extracted features; 

a literature input unit configured to receive a biological literature; and 

a named entity recognition unit configured to receive the biological literature from the 
literature input unit, extract candidate named entities from the biological literature, and 
recognize named entities based upon the rules generated by the rule collection unit. 

2. (Original) The apparatus of claim 1, wherein the resource construction unit extracts 
concept names from the metathesaurus of the UMLS, which is divided according to the semantic 
categories, to construct the concept names database, processes the concept name stored in the 
concept name database to extract single names and category keyterms, and constructs the single 
name database and the category keyterm database by using the extracted single names and 
category keyterms. 

3. (Original) The apparatus of claim 1, wherein the rule collection unit extracts the feature 
of a token constituting each of the concept names stored in the concept name database, creates the 
rules by combining the extracted features, weights the rules, filters the weighted rules with a 
threshold, and stores the filtered rules in the rule database. 

4. (Original) The apparatus of claim 1, wherein the named entity recognition unit extracts 
the candidate named entities from the literature provided through a literature input unit, extracts 
the feature of each of the tokens constituting the candidate named entity, creates a rule used to 
determine the candidate named entity by combining the extracted feature, compares the created 
rule with the rule stored in the rule database to extract an existing rule suitable for the candidate 
named entity, applies a weight value of each of the extracted rules and a heuristic used to 
determine a category of the named entity, determines a final semantic category for the candidate 
named entity, and recognizes the named entity. 

5. (Currently Amended) A method for recognizing a biological named entity from 
biological literature based on UMLS, the method comprising the steps of: 

(a) receiving metathesaurus from the UMLS; 

(b) extracting concept names, single names and category keyterms, which are language 
resources to be used to recognize named entities; 

(c) constructing a concept name database, a single name database and a category keyterm 
database; 

(d) constructing a database of rules based upon information stored within the concept 
name database, the single name database, and the category keyterm database; 

(e) inputting a literature; 

(f) extracting candidate named entities from the literature; and 

(g) recognizing named entities from the candidate named entities based upon the rules 
applied against the single name and category keyterm databases. 

6. (Currently Amended) The method of claim 5, wherein the step (b) comprises 
the steps of: 

(b-1) mapping information in MRCON table used to describe meaning of each 
string representing the concept name to information in MRSTY table used to describe a semantic 
category allocated to each concept name among tables included in the metathesaurus by using a 
mapping condition, and dividing data stored in the MRCON table according to each semantic 
category; 

(b-2) extracting values in a string (STR) field of the MRCON table from result of 
dividing a concept set and storing the extracted values in the concept name database; 

(b-3) extracting single names from the concept name database and storing the 
extracted single names in the single name database; and 

(b-4) extracting category keyterms from the concept name database and storing the 
extracted category keyterms in the category keyterm database. 

7. (Original) The method of claim 6, wherein in the mapping condition for mapping 
information in the MRCON table and the MRSTY table, if unique identifier for concept (CUI) of 
the MRCON table is identical to CUI of the MRSTY table, only data that the value of a language 
of term (LAT) field is "ENG" among the data in the MRCON table are divided into different sets 
from one another according to a value corresponding to unique identifier of semantic type (TUI) 
of the MRSTY table. 
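
By way of illustration only, and not as part of the claims, the mapping condition of claim 7 might be sketched as follows. The row layout (dictionaries keyed by CUI, LAT, STR and TUI), the function name `split_by_semantic_type`, and the sample rows are assumptions made for this sketch; they are not taken from the application.

```python
from collections import defaultdict

def split_by_semantic_type(mrcon_rows, mrsty_rows):
    """Group English MRCON strings by the TUI of the MRSTY row sharing their CUI."""
    # Index MRSTY by CUI; one concept may carry several semantic types.
    tuis_by_cui = defaultdict(set)
    for row in mrsty_rows:
        tuis_by_cui[row["CUI"]].add(row["TUI"])

    concept_sets = defaultdict(set)
    for row in mrcon_rows:
        if row["LAT"] != "ENG":  # keep only rows whose language of term is English
            continue
        # CUI equality between the two tables is the join condition.
        for tui in tuis_by_cui.get(row["CUI"], ()):
            concept_sets[tui].add(row["STR"])
    return dict(concept_sets)

# Invented sample rows (placeholder identifiers, for illustration only).
mrcon = [
    {"CUI": "C0000001", "LAT": "ENG", "STR": "protein kinase C"},
    {"CUI": "C0000001", "LAT": "FRE", "STR": "proteine kinase C"},
    {"CUI": "C0000002", "LAT": "ENG", "STR": "lung cancer"},
]
mrsty = [
    {"CUI": "C0000001", "TUI": "T116"},
    {"CUI": "C0000002", "TUI": "T191"},
]
concept_sets = split_by_semantic_type(mrcon, mrsty)
```

Each resulting set corresponds to one semantic category, from which the STR values of step (b-2) would then be stored in the concept name database.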

8. (Currently Amended) The method of claim 6, wherein the step (b-4) comprises the 
steps of: 

calculating distribution in the semantic category where each word constituting the named 
entity appears most frequently by using the concept names stored in the concept name database; 
and filtering the words with a threshold. 
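
By way of illustration only, the keyterm selection of claim 8 might be sketched as below. The claim does not state how the distribution is computed; this sketch assumes it is the share of a word's occurrences that fall in its most frequent category. The function name, the default threshold, and the sample names are likewise assumptions.

```python
from collections import Counter, defaultdict

def extract_category_keyterms(concept_names_by_category, threshold=0.8):
    """Return {word: category} for words concentrated in a single category."""
    counts = defaultdict(Counter)  # word -> Counter(category -> frequency)
    for category, names in concept_names_by_category.items():
        for name in names:
            for word in name.lower().split():
                counts[word][category] += 1

    keyterms = {}
    for word, per_category in counts.items():
        # Category in which the word appears most frequently, and its count.
        category, top = per_category.most_common(1)[0]
        share = top / sum(per_category.values())
        if share >= threshold:  # filter the words with the threshold
            keyterms[word] = category
    return keyterms

# Invented sample data.
names = {
    "protein": ["protein kinase C", "tyrosine kinase", "MAP kinase"],
    "disease": ["lung cancer", "breast cancer", "kinase deficiency"],
}
keyterms = extract_category_keyterms(names, threshold=0.75)
```

In this sketch a word seen only once passes any threshold trivially, so a practical variant would probably also require a minimum frequency; the claim is silent on that point.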



9. (Currently Amended) The method of claim 5, wherein the step (d) comprises 
the steps of: 

(d-1) extracting the features from each of the concept names stored in the concept 
name database according to a token; and 

(d-2) generating the rule by combining the tokens whose features are 
extracted, calculating weight value of the constituted rule, filtering the rules with their weight 
values, and storing the filtered rules in the rule database. 

10. (Currently Amended) The method of claim 9, wherein in the step (d-1), the feature 
of the tokens of each of the concept names stored in the concept name database is extracted using 
the features of the category keyterm, the single name and a capital letter expression, an 
alphanumeric, a special character, a preposition or conjunction, which are features defined to 
reflect characteristics of the biological named entity, and a subtype of each of the features. 
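
By way of illustration only, the token features listed in claim 10 might be extracted as sketched below. The feature labels, the small preposition and conjunction list, and the function names are assumptions; the subtypes mentioned in the claim are not defined in this listing and are therefore left out of the sketch.

```python
import re

# Assumed, deliberately short list; a real system would use a fuller one.
PREPOSITIONS_CONJUNCTIONS = {"of", "in", "for", "and", "or", "with", "to"}

def token_features(token, single_names, category_keyterms):
    """Return the set of feature labels that apply to one token."""
    features = set()
    lower = token.lower()
    if lower in category_keyterms:
        features.add("CATEGORY_KEYTERM")
    if lower in single_names:
        features.add("SINGLE_NAME")
    if lower in PREPOSITIONS_CONJUNCTIONS:
        features.add("PREP_OR_CONJ")
    if any(ch.isupper() for ch in token):
        features.add("CAPITAL")
    if any(ch.isalpha() for ch in token) and any(ch.isdigit() for ch in token):
        features.add("ALPHANUMERIC")
    if re.search(r"[^A-Za-z0-9]", token):
        features.add("SPECIAL_CHAR")
    return features

def concept_name_features(name, single_names, category_keyterms):
    """Tokenize a concept name on whitespace and extract features per token."""
    return [token_features(t, single_names, category_keyterms) for t in name.split()]

features = concept_name_features(
    "IL-2 receptor of T cells",
    single_names={"insulin"},
    category_keyterms={"receptor": "protein"},
)
```

Here `features` holds one feature set per token, in token order, which is the form assumed as input to rule generation in step (d-2).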

11. (Currently Amended) The method of claim 9, wherein the step (d-2) comprises the 
steps of: 

receiving the result in which the concept name is tokenized and the features are extracted 
at the step (d-1), and creating the rules as many as the number of combinations of subtypes 
according to the subtypes of the features of the token; and 

calculating appearance distribution of the rule in each category on all the created rules, 
filtering the rules with the threshold, and constructing the rule database. 
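
By way of illustration only, the rule generation of claim 11 might be sketched as below. The sketch assumes that a rule is a tuple holding one subtype per token, that the appearance distribution is the fraction of a rule's occurrences falling in each category, and that this fraction serves as the weight. The function names, labels, and threshold value are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

def candidate_rules(token_subtypes):
    """Create one rule per combination of subtypes, one subtype per token."""
    return list(product(*token_subtypes))

def build_rule_database(tagged_names, threshold=0.6):
    """tagged_names: iterable of (category, token_subtypes) pairs.

    Returns {rule: {category: weight}}, keeping only weights >= threshold.
    """
    counts = defaultdict(Counter)  # rule -> Counter(category -> occurrences)
    for category, token_subtypes in tagged_names:
        for rule in candidate_rules(token_subtypes):
            counts[rule][category] += 1

    database = {}
    for rule, per_category in counts.items():
        total = sum(per_category.values())
        kept = {c: n / total for c, n in per_category.items() if n / total >= threshold}
        if kept:
            database[rule] = kept
    return database

# Invented sample input: each concept name is a list of per-token subtype lists.
tagged = [
    ("protein", [["CAPITAL", "ALPHANUMERIC"], ["KEYTERM"]]),
    ("protein", [["CAPITAL"], ["KEYTERM"]]),
    ("disease", [["CAPITAL"], ["KEYTERM"]]),
]
rule_db = build_rule_database(tagged, threshold=0.6)
```

A token with two candidate subtypes followed by a token with one yields two rules, matching the claim's "as many as the number of combinations of subtypes".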

12. (Currently Amended) The method of claim 5, wherein the steps (f) and (g) 
comprise the steps of: 

(f-1) extracting nouns and noun phrases, which are candidate named entities, from 
the inputted literature; 

(g-1) extracting features of each token of a candidate named entity; 

(g-2) combining the features extracted from each of the tokens of the candidate 
named entity, and creating the rule used to determine the candidate named entity; 

(g-3) comparing the created rule with the rules stored in the rule database; and 

(g-4) determining the final semantic category of the candidate named entity. 

13. (Currently Amended) The method of claim 12, wherein in the step (g-3), existing 
rules suitable to determine the candidate named entity are extracted by 
comparing the rule used to determine the candidate named entity with the rules stored in the rule 
database in manners of exact match, partial match and nested match. 
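
By way of illustration only, the three manners of matching named in claim 13 might be sketched as below. This listing does not define them, so the definitions used here are assumptions: "exact" means identical rules, "nested" means the stored rule occurs as a contiguous run inside a longer candidate rule, and "partial" means rules of equal length agreeing in at least one position.

```python
def is_contiguous_subsequence(short, long):
    """True if `short` occurs as a contiguous run inside `long`."""
    n = len(short)
    return any(tuple(long[i:i + n]) == tuple(short) for i in range(len(long) - n + 1))

def match_type(candidate, stored):
    """Classify how a stored rule relates to a candidate rule (assumed definitions)."""
    candidate, stored = tuple(candidate), tuple(stored)
    if candidate == stored:
        return "exact"
    if 0 < len(stored) < len(candidate) and is_contiguous_subsequence(stored, candidate):
        return "nested"
    if len(stored) == len(candidate) and any(a == b for a, b in zip(candidate, stored)):
        return "partial"
    return None  # the stored rule is not suitable for this candidate

def extract_existing_rules(candidate, rule_database):
    """Return [(stored_rule, match_kind)] for every stored rule that matches."""
    matches = []
    for stored in rule_database:
        kind = match_type(candidate, stored)
        if kind is not None:
            matches.append((stored, kind))
    return matches
```

For example, under these assumed definitions `match_type(("A", "B", "C"), ("B", "C"))` classifies the stored rule as nested in the candidate.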

14. (Currently Amended) The method of claim 12, wherein in the step (g-4), the final 
semantic category of the candidate named entity is determined using weight values of existing 
rules extracted at the step (g-3) and a heuristic used to determine a category of the named 
entity, and outputted as a result of recognizing the named entity. 
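
By way of illustration only, the weighted determination of claim 14 might be sketched as below. The sketch assumes the weights of the matched rules are summed per category and the largest total wins. The heuristic recited in the claim is not specified in this listing and is therefore omitted; the function name and sample weights are hypothetical.

```python
from collections import defaultdict

def final_category(matched_rules):
    """matched_rules: iterable of (category, weight) pairs from the matched rules.

    Returns the category with the largest summed weight, or None if nothing matched.
    """
    totals = defaultdict(float)
    for category, weight in matched_rules:
        totals[category] += weight
    if not totals:
        return None
    return max(totals, key=totals.get)

# Invented weights: two weaker protein rules outweigh one stronger disease rule.
decision = final_category([("protein", 0.7), ("disease", 0.9), ("protein", 0.4)])
```

Summing is only one possible way to combine weights; taking the single highest-weighted rule would be an equally plausible reading of the claim language.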

15. (New) The method of claim 1, wherein the candidate named entities are nouns and noun 
phrases. 



